Variance reduction in large graph sampling

Authors

  • Jianguo Lu
  • Hao Wang
Abstract

Common practice in estimating graph properties is to use uniform random node (RN) samples whenever possible. Many graphs are large and scale-free, which induces large degree variance and, in turn, large estimator variance. This paper shows that random edge (RE) sampling and the corresponding harmonic mean estimator for average degree can reduce the estimation variance significantly. First, we demonstrate that the degree variance, and consequently the variance of the RN estimator, can grow almost linearly with data size for typical scale-free graphs. Then we prove that the variance of the RE estimator is bounded from above. Therefore, the variance ratio between RN and RE sampling can be very large for big data. The analytical result is supported by both simulation studies and 18 real networks. We observe that the variance reduction ratio can exceed one hundred for some real networks such as Twitter. Furthermore, we show that random walk (RW) sampling is always worse than RE sampling, and that it can reduce the variance of the RN method only when its performance is close to that of RE sampling.
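
The contrast between the RN and RE estimators can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes a synthetic power-law degree sequence (the helper `make_degrees` and its parameters `alpha` and `d_max` are hypothetical), models RE sampling as drawing nodes with probability proportional to degree, and uses a common form of the harmonic mean estimator, k / Σ(1/d_i), which may differ in detail from the paper's formulation.

```python
import random
from itertools import accumulate
from statistics import mean, pvariance


def make_degrees(n, alpha=2.0, d_max=1000, seed=0):
    """Hypothetical heavy-tailed degree sequence, P(d) proportional to d**-alpha."""
    rng = random.Random(seed)
    support = list(range(1, d_max + 1))
    weights = [d ** -alpha for d in support]
    return rng.choices(support, weights=weights, k=n)


def rn_estimate(degrees, k, rng):
    """RN: sample k nodes uniformly (with replacement), take the arithmetic mean."""
    sample = rng.choices(degrees, k=k)
    return sum(sample) / k


def re_estimate(degrees, cum_degrees, k, rng):
    """RE: sample k nodes with probability proportional to degree, as happens
    when picking a uniform random edge and then one of its endpoints.
    The harmonic mean undoes the degree bias."""
    sample = rng.choices(degrees, cum_weights=cum_degrees, k=k)
    return k / sum(1.0 / d for d in sample)


def run_trials(degrees, k=200, trials=300, seed=0):
    """Repeat both estimators `trials` times and return the two lists of estimates."""
    rng = random.Random(seed)
    cum_degrees = list(accumulate(degrees))
    rn = [rn_estimate(degrees, k, rng) for _ in range(trials)]
    re_ = [re_estimate(degrees, cum_degrees, k, rng) for _ in range(trials)]
    return rn, re_


if __name__ == "__main__":
    degrees = make_degrees(20000, seed=1)
    rn, re_ = run_trials(degrees, seed=2)
    print("true average degree:", sum(degrees) / len(degrees))
    print("RN mean, variance:", mean(rn), pvariance(rn))
    print("RE mean, variance:", mean(re_), pvariance(re_))
```

Under degree-proportional sampling the expected value of 1/d equals the reciprocal of the average degree, which is why the harmonic mean is used here. The variance gap this toy setup shows depends on the assumed degree distribution and sample size and should not be read as a reproduction of the paper's results.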

Similar resources

On Combining Graph-based Variance Reduction schemes

In this paper, we consider two variance reduction schemes that exploit the structure of the primal graph of the graphical model: Rao-Blackwellised w-cutset sampling and AND/OR sampling. We show that the two schemes are orthogonal and can be combined to further reduce the variance. Our combination yields a new family of estimators which trade time and space with variance. We demonstrate experime...

Combined Correlated and Importance Sampling in Direct Light Source Computation and Environment Mapping

This paper presents a general variance reduction method that is a quasi-optimal combination of correlated and importance sampling. The weights of the combination are selected automatically in order to keep the merits of both importance and correlated sampling. The proposed sampling method is used for efficient direct light source computation of large area sources and for the calculation of the ...

Stochastic Training of Graph Convolutional Networks with Variance Reduction

Graph convolutional networks (GCNs) are powerful deep neural networks for graph-structured data. However, GCN computes the representation of a node recursively from its neighbors, making the receptive field size grow exponentially with the number of layers. Previous attempts on reducing the receptive field size by subsampling neighbors do not have a convergence guarantee, and their receptive fi...

Variance Reduction Techniques for Estimating Value-at-Risk

This paper describes, analyzes and evaluates an algorithm for estimating portfolio loss probabilities using Monte Carlo simulation. Obtaining accurate estimates of such loss probabilities is essential to calculating value-at-risk, which is a quantile of the loss distribution. The method employs a quadratic ("delta-gamma") approximation to the change in portfolio value to guide the selection of e...

Electron. Commun. Probab. 20 (2015), no. 15, DOI: 10.1214/ECP.v20-3855

In recent papers it has been demonstrated that sampling a Gibbs distribution from an appropriate time-irreversible Langevin process is, from several points of view, advantageous when compared to sampling from a time-reversible one. Adding an appropriate irreversible drift to the overdamped Langevin equation results in a larger large deviations rate function for the empirical measure of the proc...

Journal:
  • Inf. Process. Manage.

Volume 50

Publication date: 2014